Metastability in electronics

Metastability in electronics is the ability of a digital electronic system to persist for an unbounded time in an unstable equilibrium or metastable state.[1] In a metastable state, the circuit may be unable to settle into a stable '0' or '1' logic level within the time required for proper circuit operation. As a result, the circuit can behave unpredictably, which may lead to system failure.

Metastable states are inherent features of asynchronous digital systems, and of systems with more than one independent clock domain. In self-timed asynchronous systems, arbiters are designed to allow the system to proceed only after the metastability has resolved, so the metastability is a normal condition, not an error condition.[2] In synchronous systems with asynchronous inputs, synchronizers are designed to make the probability of a synchronization failure acceptably small.[3] Metastable states are avoidable in fully synchronous systems when the input setup and hold time requirements on flip-flops are satisfied.

Flip-flops

In electronics, the flip-flop is a device that is susceptible to metastability. It has two well-defined stable states, traditionally designated 0 and 1, but under certain conditions (see below) it can hover between them for longer than a clock cycle. This condition is known as metastability. Such a metastable "state" is considered a failure mode of the logic design and timing philosophy or implementation.

The most common cause of metastability is violating the flip-flop's setup and hold times. During the capture window, which runs from the setup time before the active clock edge to the hold time after it, the data input of the flip-flop should remain in a stable logic state; a change of the data input within that window has some probability of driving the flip-flop into a metastable state.
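
The capture-window condition can be written as a simple timing check. The following is a minimal sketch, not a model of any particular device: the function name, the use of integer picoseconds, and the choice to treat the window boundaries as safe are all assumptions made for illustration.

```python
def violates_capture_window(data_change_t, clock_edge_t, t_setup, t_hold):
    """Return True if a data transition falls inside the capture window.

    All times are in the same unit (integer picoseconds here, an assumption).
    The window spans from t_setup before the clock edge to t_hold after it;
    a transition exactly on a boundary is treated as safe (also an assumption).
    """
    window_start = clock_edge_t - t_setup
    window_end = clock_edge_t + t_hold
    return window_start < data_change_t < window_end


# Hypothetical flip-flop: clock edge at 10 000 ps, 200 ps setup, 100 ps hold.
EDGE, SETUP, HOLD = 10_000, 200, 100
early_enough = violates_capture_window(9_700, EDGE, SETUP, HOLD)   # before the window
too_late = violates_capture_window(9_900, EDGE, SETUP, HOLD)       # inside setup time
during_hold = violates_capture_window(10_050, EDGE, SETUP, HOLD)   # inside hold time
```

A violation reported by such a check does not mean the flip-flop will become metastable, only that it may.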

In a typical scenario where data travels from the output of a source flip-flop to the input of a target flip-flop, metastability is caused by either:

  1. the target clock having a different frequency than the source clock, in which case the setup and hold time of the target flip-flop will eventually be violated, or
  2. the target and source clocks having the same frequency, but a phase alignment that causes the data to arrive at the target flip-flop within its setup and hold window. This can be caused by fixed overhead or variations in logic delay times on the worst-case path between the two flip-flops, variations in clock arrival times (clock skew), or other causes.[4][5]
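
Both cases can be illustrated with a small timing simulation. The sketch below is idealized and rests on several assumptions that are not taken from the cited sources: the clocks are jitter-free, target clock edges fall at integer multiples of its period, the data changes on every source clock edge after a fixed delay, and all names and numbers are hypothetical.

```python
def first_violation(src_period, dst_period, delay, t_setup, t_hold,
                    max_cycles=100_000):
    """Return the index of the first source cycle whose data transition
    lands inside the target flip-flop's setup/hold window, or None.

    Times are integer picoseconds. Target clock edges are assumed to occur
    at 0, dst_period, 2*dst_period, ...
    """
    for n in range(max_cycles):
        change_t = n * src_period + delay
        # Time elapsed since the most recent target clock edge.
        phase = change_t % dst_period
        within_hold = phase < t_hold                  # too soon after an edge
        within_setup = dst_period - phase < t_setup   # too close to the next edge
        if within_hold or within_setup:
            return n
    return None


# Case 1: slightly different frequencies, so the phase drifts every cycle.
drifting = first_violation(10_000, 10_007, 5_000, t_setup=200, t_hold=100)

# Case 2: same frequency, fixed phase. A safe alignment never violates,
# while an unlucky one violates on every cycle.
safe_phase = first_violation(10_000, 10_000, 5_000, 200, 100, max_cycles=1_000)
bad_phase = first_violation(10_000, 10_000, 9_900, 200, 100)
```

In the first case the data transition slides 7 ps per cycle relative to the target clock, so a violation is only a matter of time; in the second, the outcome depends entirely on the fixed delay.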

Arbiters

In electronics, an arbiter is a circuit designed to determine which of several signals arrives first. Arbiters are used in asynchronous circuits to order computational activities for shared resources to prevent concurrent incorrect operations. Arbiters are used on the inputs of fully synchronous systems, and also between clock domains, as synchronizers for input signals. Although they can reduce the occurrence of metastability to very low probabilities, all arbiters nevertheless have metastable states, which are unavoidable at the boundaries of regions of the input state space resulting in different outputs.[6]

Synchronous circuits

Synchronous circuit design techniques make digital circuits that are resistant to the failure modes that can be caused by metastability. A clock domain is defined as a group of flip-flops with a common clock. Such architectures can form a circuit guaranteed free of metastability below a certain maximum clock frequency; above that frequency, first metastability and then outright failure occur. However, even then, if the system depends on any continuous inputs, these are likely to be vulnerable to metastable states.[7]

When synchronous design techniques are used, protection against metastable events causing system failures need only be provided when transferring data between different clock domains or from an unclocked region into the synchronous system. This protection often takes the form of a series of delay flip-flops, which delay the data stream long enough for any metastability to have resolved with high probability.
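
The benefit of the added delay is often estimated with a first-order mean-time-between-failures (MTBF) model, MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). This model is a common approximation and is not taken from the text above; the sketch below assumes it, and the parameter names and example values are hypothetical rather than data for any real device.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Estimate synchronizer MTBF in seconds under a first-order model.

    t_resolve: time allowed for metastability to resolve (s)
    tau:       resolution time constant of the flip-flop (s)
    t_window:  width of the metastability capture window (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    average rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical numbers: 100 MHz clock, 1 MHz data rate, tau = 50 ps.
TAU, T_WINDOW, F_CLK, F_DATA = 50e-12, 100e-12, 100e6, 1e6
CLOCK_PERIOD = 1 / F_CLK

one_stage = synchronizer_mtbf(5e-9, TAU, T_WINDOW, F_CLK, F_DATA)
# An extra flip-flop stage grants roughly one more clock period to resolve.
two_stage = synchronizer_mtbf(5e-9 + CLOCK_PERIOD, TAU, T_WINDOW, F_CLK, F_DATA)
improvement = two_stage / one_stage  # equals exp(CLOCK_PERIOD / TAU) in this model
```

Because the resolution time appears in the exponent, each additional stage multiplies the estimated MTBF by a very large factor, which is why a short chain of flip-flops is usually sufficient.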

Failure modes

Although metastability is well understood and architectural techniques to control it are known, it persists as a failure mode in equipment.

Serious computer and digital hardware bugs caused by metastability have a fascinating social history. Many engineers have refused to believe that a bistable device can enter a state that is neither true nor false and that has a positive probability of remaining indefinite for any given period of time, albeit with exponentially decreasing probability over time. However, metastability is an inevitable result of any attempt to map a continuous domain to a discrete one. There will always be points in the continuous domain that are equidistant (or nearly so) from the points of the discrete domain, making the decision as to which discrete point to select a difficult and potentially lengthy process.[8] If the inputs to an arbiter or flip-flop arrive almost simultaneously, the circuit will most likely traverse a point of metastability.

Metastability remains poorly understood in some circles, and various engineers have proposed their own circuits said to solve or filter out the metastability; typically these circuits simply shift the occurrence of metastability from one place to another.[9]

Chips using multiple clock sources are often tested with tester clocks that have fixed phase relationships, not the independent clocks drifting past each other that will be experienced during operation. This usually prevents the metastable failure mode that will occur in the field from being seen or reported during testing. Current engineering solutions to this problem are often the well-characterized, multi-stage common-clock shift registers discussed in the links below.
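
The statement that a decision near an equidistant point can take arbitrarily long can be illustrated with a linearized model of a bistable element. The sketch below assumes, purely for illustration, that a small initial imbalance v0 grows as v(t) = v0 * exp(t / tau) until it reaches a valid logic level; this model and all names in it are assumptions, not taken from the cited sources.

```python
import math


def resolution_time(v0, tau, v_logic=1.0):
    """Time for an initial imbalance v0 to grow to the logic level v_logic.

    Assumes exponential divergence v(t) = v0 * exp(t / tau) away from the
    metastable point at v = 0. A perfectly balanced start never resolves.
    """
    if v0 == 0:
        return math.inf
    # Solve v_logic = |v0| * exp(t / tau) for t; already-resolved inputs take 0.
    return max(0.0, tau * math.log(v_logic / abs(v0)))


TAU = 50e-12  # hypothetical time constant in seconds

# Each thousandfold reduction in the initial imbalance adds the same fixed
# amount of time, tau * ln(1000), so the resolution time has no upper bound.
times = [resolution_time(v0, TAU) for v0 in (1e-3, 1e-6, 1e-9)]
never = resolution_time(0.0, TAU)
```

In this model the resolution time grows only logarithmically as the imbalance shrinks, which corresponds to the exponentially decreasing probability of remaining unresolved, yet no finite waiting time covers every possible starting point.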

References

  1. ^ Thomas J. Chaney and Charles E. Molnar (April 1973). "Anomalous Behavior of Synchronizer and Arbiter Circuits". IEEE Transactions on Computers C-22 (4): 421–422. doi:10.1109/T-C.1973.223730. ISSN 0018-9340. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1672323. 
  2. ^ John Bainbridge (2002). Asynchronous system-on-chip interconnect. Springer. p. 18. ISBN 9781852335984. http://books.google.com/books?id=6cWoGRfz_k0C&pg=PA18&dq=metastable+alternative-approach+gain–bandwidth-product&hl=en&ei=pmgATYH1IY30tgPkmdSvCw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCMQ6AEwAA#v=onepage&q=metastable%20alternative-approach%20gain–bandwidth-product&f=false. 
  3. ^ Thomas J. Chaney (February 1974). "Reprint of Technical Memorandum No. 10, 'The Glitch Phenomenon' (1966)". The Synchronizer 'Glitch' Problem. St. Louis, Missouri: Washington University. Appendix A. http://www.sunlabs.com/vlsi/pubs/p1vol4macrom.pdf.
  4. ^ Interfacing Two Clock Domains, ASIC World
  5. ^ Chuck Benz Fifos and Ring Buffers
  6. ^ Richard F. Tinder (2009). Asynchronous sequential machine design and analysis: a comprehensive development of the design and analysis of clock-independent state machines and systems. Morgan & Claypool Publishers. p. 165. ISBN 9781598296891. http://books.google.com/books?id=wtvKY5U2Ud4C&pg=PA165. 
  7. ^ L. Kleeman and A. Cantoni (December 1987). "Metastable Behavior in Digital Systems". IEEE Design & Test of Computers 4 (6): 4–19. doi:10.1109/MDT.1987.295189.
  8. ^ Leslie Lamport (December 1984). "Buridan's Principle". http://research.microsoft.com/users/lamport/pubs/buridan.pdf. Retrieved 2010-07-09. 
  9. ^ Ran Ginosar (2003). "Fourteen Ways to Fool Your Synchronizer". ASYNC 2003.

External links